当加强学习以稀疏的奖励应用时,代理必须花费很长时间探索未知环境而没有任何学习信号。抽象是一种为代理提供在潜在空间中过渡的内在奖励的方法。先前的工作着重于密集的连续潜在空间,或要求用户手动提供表示形式。我们的方法是第一个自动学习基础环境的离散抽象的方法。此外,我们的方法使用端到端可训练的正规后继代表模型在任意输入空间上起作用。对于抽象状态之间的过渡,我们以选项的形式训练一组时间扩展的动作,即动作抽象。我们提出的算法,离散的国家行动抽象(DSAA),在训练这些选项之间进行迭代交换,并使用它们有效地探索更多环境以改善状态抽象。结果,我们的模型不仅对转移学习,而且在在线学习环境中有用。我们从经验上表明,与基线加强学习算法相比,我们的代理能够探索环境并更有效地解决任务。我们的代码可在\ url {https://github.com/amnonattali/dsaa}上公开获得。
translated by 谷歌翻译
For applications that require processing large amounts of text at inference time, Large Language Models (LLMs) are handicapped by their limited context windows, which are typically 2048 tokens. In-context learning, an emergent phenomenon in LLMs in sizes above a certain parameter threshold, constitutes one significant example because it can only leverage training examples that fit into the context window. Existing efforts to address the context window limitation involve training specialized architectures, which tend to be smaller than the sizes in which in-context learning manifests due to the memory footprint of processing long texts. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks (``windows'') that fit within the architecture, restrict the attention mechanism to apply only within each window, and re-use the positional embeddings among the windows. We test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters, and show substantial improvements for tasks with diverse input and output spaces. Our results motivate further investigation of Parallel Context Windows as a method for applying off-the-shelf LLMs in other settings that require long text sequences.
translated by 谷歌翻译